Motivation

The US Wind turbine database provides onshore & offshore wind turbine locations in the United States, corresponding facility information, and turbine technical specifications.

Database could be download from here

The authors of database have created a web application called the US wind turbine database viewer available to the link. The viewer lets you visualize inspect and interact with the turbine database through web browser.

The goal of the analysis is to show the variation of capacity in time and highlight the contribution of each state.

In first part of the report, the total capacity is showed as a function of time and as a function of states.

In the second part, the evolution of total capacity is showed on map. The final plot represents the total capacity by states in three different years 1998, 2008 and 2018.

Load library

library(readr)       # load data in csv format
library(viridis)     # color palettes
library(dplyr)       # manipulate dataframe
library(sf)          # manipulate shape file
library(ggplot2)     # plot
library(forcats)     # reorder factor levels
library(datasets)    # dataset of US state name
library(leaflet)     # interactive maps
library(DT)          # interactive table

Load data

Load the us wind turbine dataset:

usw <- read_csv("uswtdb_v1_1_20180710.csv")

Table

usw_dt <- usw %>%
  select(p_year,t_state,t_cap)
datatable(usw_dt, extensions = 'Buttons', options = list(
    dom = 'Bfrtip',
    buttons = c('copy', 'csv', 'excel', 'pdf', 'print')
  )
)

Total capacity as function of states and years

Calculate the amount of total capacity added for each year:

usw_year <- usw %>%
  filter(t_cap > 0) %>% # remove t_cap negative value (aka missing) 
  filter(p_year >0 ) %>% # remove p_year negative value (aka missing) 
  group_by( p_year) %>%
  summarise(totalcapacity = sum(t_cap)/1000000) #calculate the total capacity in GW

Plot total capacity using the cumsum function inside ggplot to calculate the cumulative value of total capacity:

ggplot(data = usw_year, aes(x = p_year, y = cumsum(totalcapacity))) + geom_line() +
   geom_vline(xintercept = 2006, alpha=0.4,linetype = "dashed" ) +
   annotate("text",x=2003,y=20,label="2006",hjust=0, colour = "gray") +
    labs( x= "", y="Total Capacity [GW]",
       title="US Wind Turbine Total Rated Capacity",
       subtitle = "from 1981 to 2018",
       caption="Source: U.S. Wind Turbine Database")+
    theme_minimal() 

The total capacity has a sharp increase after 2006.

Calculate total capacity in each states:

usw_states <- usw %>%
  filter(t_cap > 0) %>% # remove t_cap negative value (aka missing) 
  filter(p_year >0 ) %>% # remove p_year negative value (aka missing) 
  group_by( t_state) %>%
  summarise(totalcapacity = sum(t_cap)/1000000) #calculate the total capacity in GW

Add state names to usw_states dataset by using state dataset loaded from datasets library:

data(state)
states <- cbind(state.name,state.abb)
states <- rbind(states,c("Guam","GU"))
states <- data.frame(rbind(states,c("Puerto Rico","PR")))

# join by abbreviation name column
usw_states <- left_join(usw_states, states,
                          by=c("t_state"="state.abb"))

Plot the total capacity as function of states:

ggplot(usw_states,
       aes(x=totalcapacity, y=fct_reorder(state.name, totalcapacity, desc=TRUE))) +
  geom_segment(  
           aes(x = 0,
           y=fct_reorder(state.name, totalcapacity, desc=TRUE),
           xend = totalcapacity,
           yend = fct_reorder(state.name, totalcapacity, desc=TRUE)),
           color = "gray50") +
           geom_point() +
  labs(x="Total Rated Capacity [GW]", y="", 
       title = "US Wind Turbine",
       subtitle = "2018",
       caption = "Source: U.S. Wind Turbine Database") +
  theme_minimal() +
  # NEW CODE BELOW
  geom_text(aes(label=round(totalcapacity,1)), hjust=-.5) +
  theme(panel.border = element_blank(), 
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(), 
        axis.line = element_blank(),
        axis.text.x = element_blank())

Texas has the higher value of total capacity, about 23 GW.

Choropleth map of US Wind turbine database

Load US map shape file

The shape file could be download from the Census

Read the shape file of state boundaries:

fifty_location <- "static_maps/data/cb_2017_us_state_20m/cb_2017_us_state_20m.shp"
fifty_states <- st_read(fifty_location)
## Reading layer `cb_2017_us_state_20m' from data source `C:\Users\alessio.passalacqua\Documents\2P\R-DATASCIENCE\corso-analisi\data_journalism\static_maps\data\cb_2017_us_state_20m\cb_2017_us_state_20m.shp' using driver `ESRI Shapefile'
## Simple feature collection with 52 features and 9 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -179.1743 ymin: 17.91377 xmax: 179.7739 ymax: 71.35256
## epsg (SRID):    4269
## proj4string:    +proj=longlat +datum=NAD83 +no_defs

Create dataset for different period

Wind turbine in 1998

Create a dataframe referning to 1998:

usw_tc98 <- usw %>%
  filter(t_cap > 0) %>% # remove t_cap negative (aka missing) 
  filter(p_year >0 & p_year <1999) %>% 
  group_by(t_state) %>%
  summarise(totalcapacity = sum(t_cap)/1000000) #calculate the total capacity in GW

Join shape file and usw in 1998:

usw_tc98 <- left_join(fifty_states, usw_tc98,
                          by=c("STUSPS"="t_state"))
# Create column year and refered to 1998
usw_tc98$Year <- "1998"

Wind turbine in 2008

Create a dataframe referning to 2008:

usw_tc08 <- usw %>%
  filter(t_cap > 0) %>%
  filter(p_year > 0 & p_year < 2009) %>%
  group_by(t_state) %>%
  summarise(totalcapacity = sum(t_cap)/1000000) 

Join shape file and usw in 2008:

usw_tc08 <- left_join(fifty_states, usw_tc08,
                          by=c("STUSPS"="t_state"))

usw_tc08$Year <- "2008"

Wind turbine in 2018

Create a dataframe referning to 2018:

usw_tc18 <- usw %>%
  filter(t_cap > 0) %>%
  filter(p_year > 0) %>%
  group_by(t_state) %>%
  summarise(totalcapacity = sum(t_cap)/1000000)

Join shape file and usw in 2018:

usw_tc18 <- left_join(fifty_states, usw_tc18,
                          by=c("STUSPS"="t_state"))

usw_tc18$Year <- "2018"

Plot Maps

Aggregate all the dataframes by row:

usw_30y <- rbind(usw_tc98,usw_tc08,usw_tc18) 

Remove no contiguos states for more compact visualization:

usw_30y <- usw_30y %>% 
  filter(NAME!="Hawaii" & NAME!="Alaska" & NAME!="Puerto Rico")

Plot maps:

ggplot(usw_30y) +
  geom_sf(aes(fill=totalcapacity)) +
  facet_wrap(~Year,ncol=3) +
  labs(title="Wind Turbine Total Rated Capacity of 48 states",
       subtitle = "for R for Journalists class",
       caption="Source: U.S. Wind Turbine Database")+
  scale_fill_viridis(direction=-1, name="GW") +
  scale_color_viridis(direction=-1) +
  theme_void() +
  theme(panel.grid.major = element_line(colour = 'transparent'))

Interactive choropleth map

usw_tc18$totalcapacity <- ifelse(is.na(usw_tc18$totalcapacity),0,usw_tc18$totalcapacity) 
pal <- colorNumeric("Oranges", domain=usw_tc18$totalcapacity)
popup_sb <- paste0("Total Capacity: ", as.character(round(usw_tc18$totalcapacity,1)))

leaflet() %>%
  addProviderTiles("CartoDB.Positron") %>%
  setView(-98.483330, 38.712046, zoom = 4) %>% 
  addPolygons(data = usw_tc18, 
              fillColor = ~pal(usw_tc18$totalcapacity), 
              fillOpacity = 0.7, 
              weight = 0.2, 
              smoothFactor = 0.2,
              popup = ~popup_sb ) %>%
  addLegend(pal = pal, 
            values = usw_tc18$totalcapacity, 
            position = "bottomright", 
            title = "Total Capacity [GW]")

Connect to Github repository

This lines of code was runned on terminal to connect the local repository to my Github repository


  • git init .
  • git add .
  • git commit -m "first commit"
  • git remote add origin https://github.com/alessiopassalacqua/r-journalism.git
  • git push -u origin master